Watch: Skier tells BBC of 'panic' as avalanche hit Italian slopes

BBC News

A skier filmed the moment an avalanche struck a mountain valley in northern Italy on Tuesday. The footage, filmed by Siobhan Halford, shows a group of people, including children, in a queue, covered in snow after being hit by the aftermath of an avalanche in Courmayeur. Speaking to the BBC News Channel, Halford, who is from Billericay, Essex, described how the moment unfolded. "We couldn't see, it was hard to breathe. There was a lot of panic," she said.


Jurassic Coast rockfall captured on video

BBC News

A visitor has called it a miracle that no one was hurt when a section of cliff collapsed on to a beach on Dorset's Jurassic Coast. Suzanne Sears, from Hemel Hempstead in Hertfordshire, was walking near West Bay when she heard a deep cracking noise coming from the cliffs shortly before the rockfall, which happened just after 16:00 GMT on Tuesday. The Maritime and Coastguard Agency confirmed a rescue team was sent in response to a report of a cliff fall at West Bay and that no one was found to be in distress.


Giant purple dinosaur caught fly-tipping on CCTV

BBC News

A fly-tipper dressed as a giant purple T. rex has been caught on camera dumping rubbish in a street. The brightly coloured rogue raptor was spotted checking for traffic before crossing a road in Southend, Essex. The prehistoric predator then looks around before slinging two black bin bags to the ground next to a large black bin. Footage of the incident, first reported by Your Southend, was captured on a resident's CCTV just before 21:30 GMT on Tuesday. The city council told the BBC it had not received any reports of fly-tipping in relation to the incident.



CAViAR: Critic-Augmented Video Agentic Reasoning

Menon, Sachit, Iscen, Ahmet, Nagrani, Arsha, Weyand, Tobias, Vondrick, Carl, Schmid, Cordelia

arXiv.org Artificial Intelligence

Video understanding has seen significant progress in recent years, with models' performance on perception from short clips continuing to rise. Yet, multiple recent benchmarks, such as LVBench, Neptune, and ActivityNet-RTL, show that performance wanes on tasks requiring complex reasoning over videos as queries grow more complex and videos grow longer. In this work, we ask: can existing perception capabilities be leveraged to successfully perform more complex video reasoning? In particular, we develop a large language model agent with access to video modules as subagents or tools. Rather than following a fixed procedure to solve queries, as in previous work such as Visual Programming, ViperGPT, and MoReVQA, the agent uses the results of each call to a module to determine subsequent steps. Inspired by work in the textual reasoning domain, we introduce a critic to distinguish between successful and unsuccessful sequences from the agent. We show that the combination of our agent and critic achieves strong performance on the previously mentioned datasets.
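The abstract's core loop can be pictured as follows. This is a minimal illustrative sketch, not the authors' code: the module names (`caption`, `detect`), the toy policy, and the keyword-based critic are all invented stand-ins for the learned components described in the paper; real perception modules and a learned critic would replace the stubs.

```python
# Hypothetical sketch of an agentic video-reasoning loop with a critic.
# All perception "modules" are stubbed; in CAViAR these would be real
# video models invoked as tools, and the critic would be learned.

def caption_clip(t):                      # stub captioning module
    return "a skier descends" if t < 5 else "snow cloud engulfs the slope"

def detect_objects(t):                    # stub detection module
    return ["skier"] if t < 5 else ["skier", "avalanche"]

MODULES = {"caption": caption_clip, "detect": detect_objects}

def agent(query, horizon=2):
    """Toy policy: choose the next module based on the query and the
    results accumulated so far, rather than a fixed program."""
    trajectory, t = [], 0
    for _ in range(horizon):
        tool = "detect" if ("what" in query or trajectory) else "caption"
        result = MODULES[tool](t)         # each result informs the next step
        trajectory.append((tool, result))
        t += 5                            # advance to a later clip
    return trajectory

def critic(trajectory, query):
    """Toy critic: score 1.0 if the trajectory surfaced evidence that
    answers the query, else 0.0 (a learned critic would score sequences)."""
    evidence = " ".join(str(r) for _, r in trajectory)
    return 1.0 if "avalanche" in evidence else 0.0

traj = agent("what hazard appears in the video?")
score = critic(traj, "what hazard appears in the video?")
```

The point of the sketch is the control flow: the agent's next call depends on prior module outputs, and the critic ranks whole trajectories rather than individual answers.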


All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark

Testa, Davide, Bonetta, Giovanni, Bernardi, Raffaella, Bondielli, Alessandro, Lenci, Alessandro, Miaschi, Alessio, Passaro, Lucia, Magnini, Bernardo

arXiv.org Artificial Intelligence

We introduce MAIA (Multimodal AI Assessment), a native-Italian benchmark designed for fine-grained investigation of the reasoning abilities of visual language models on videos. MAIA differs from other available video benchmarks in its design, its reasoning categories, the metric it uses, and the language and culture of its videos. It evaluates Vision Language Models (VLMs) on two aligned tasks: a visual statement verification task and an open-ended visual question-answering task, both over the same set of video-related questions. It considers twelve reasoning categories that aim to disentangle language and vision relations by highlighting when one of the two alone encodes sufficient information to solve the tasks, when both are needed, and when the full richness of the short video is essential rather than just a part of it. Thanks to its carefully thought-out design, it evaluates VLMs' consistency and visually grounded natural language comprehension and generation simultaneously through an aggregated metric. Last but not least, the video collection has been carefully selected to reflect Italian culture, and the language data are produced by native speakers.
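The aggregated consistency idea, with two aligned tasks per question, can be sketched as below. This is a hypothetical illustration, not MAIA's actual metric: the field names and the simple "both tasks must succeed" rule are assumptions standing in for whatever aggregation the benchmark defines.

```python
# Hypothetical consistency aggregation over MAIA-style aligned task pairs:
# a model gets credit for a question only if it succeeds on BOTH the
# statement-verification item and the open-ended QA item for that question.

def consistency_score(items):
    """items: list of dicts with booleans 'verification_ok' and 'qa_ok',
    one dict per video-related question. Returns the fraction of
    questions answered consistently across both aligned tasks."""
    if not items:
        return 0.0
    both = sum(1 for it in items if it["verification_ok"] and it["qa_ok"])
    return both / len(items)

results = [
    {"verification_ok": True,  "qa_ok": True},   # consistent on both tasks
    {"verification_ok": True,  "qa_ok": False},  # verified, but QA failed
    {"verification_ok": False, "qa_ok": False},  # failed both
]
score = consistency_score(results)
```

Requiring joint success is what makes such a metric probe consistency rather than per-task accuracy: a model that verifies a statement but cannot answer the matching open-ended question earns nothing for that item.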


Ukraine claims drone strike on Russian oil refinery

BBC News

Andriy Kovalenko, head of Ukraine's centre for countering disinformation, said on Telegram that an oil refinery in Ryazan had been hit, as well as the Kremniy factory in Bryansk, which Kyiv says produces missile components and other weapons. Bloggers on Telegram posted images and videos of fires raging at the Ryazan facility, which covers around 6 sq km (2.3 sq miles). Verified footage shows people fleeing from the site in cars and on foot as a fireball rises into the sky. BBC Verify used video footage to establish the location of two fires at the refinery. One video shows a fire near the northern entrance, whose location was matched by the road layout, signs and fences.


Pegasus-v1 Technical Report

Jung, Raehyuk, Go, Hyojun, Yi, Jaehyuk, Jang, Jiho, Kim, Daniel, Suh, Jay, Lee, Aiden, Han, Cooper, Lee, Jae, Kim, Jeff, Kim, Jin-Young, Kim, Junwan, Park, Kyle, Lee, Lucas, Ha, Mars, Seo, Minjoon, Jo, Abraham, Park, Ed, Kianinejad, Hassan, Kim, SJ, Moon, Tony, Jeong, Wade, Popescu, Andrei, Kim, Esther, Yoon, EK, Heo, Genie, Choi, Henry, Kang, Jenna, Han, Kevin, Seo, Noah, Nguyen, Sunny, Won, Ryan, Park, Yeonhoo, Giuliani, Anthony, Chung, Dave, Yoon, Hans, Le, James, Ahn, Jenny, Lee, June, Saini, Maninder, Sanders, Meredith, Lee, Soyoung, Kim, Sue, Couture, Travis

arXiv.org Artificial Intelligence

This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers with a balanced view of its current state and its future direction.


AI popstar Anna Indiana is ridiculed for her first single - so, do YOU think it deserves the hate?

Daily Mail - Science & tech

Critics might complain that modern pop music is soulless and artificial - but a new 'AI popstar' takes that to a whole new level. Anna Indiana, a self-described AI singer-songwriter, has been ridiculed after releasing her first single. In a video posted to YouTube, Anna performs a pop song to a backing track of piano, guitar, and drums. Introducing itself, the AI explains: 'Everything from the key, tempo, chord progression, melody notes, rhythm, lyrics, and my image and singing, is auto-generated using AI.' However, music fans have not reacted well to the release, calling it 'horrifying' and 'unnerving'.